SNP prediction

Author: Bartosz Lewandowski

------------------------------------------------------------------------------------------------------------

Workflow

1. Understanding our data and quick review
2. Preprocessing
3. Building the first neural network
4. Checking results
5. Plots for the first neural network
6. Building the second neural network
7. Checking results
8. Plots for the second NN
9. Last NN using StratifiedKFold
10. Results
11. Summarizing results and choosing the best neural network

Import of necessary functions

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import pandas_profiling 
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import EarlyStopping
from keras.callbacks import ModelCheckpoint
from sklearn.metrics import balanced_accuracy_score
from sklearn.metrics import matthews_corrcoef
from sklearn.metrics import roc_auc_score
from sklearn.impute import KNNImputer
from pandas_profiling import ProfileReport
from keras import backend as K
from keras.layers import Dropout
from sklearn.metrics import roc_curve,auc
Using TensorFlow backend.

1. Data overview

Reading data

In [3]:
data = pd.read_csv('data/logreg.txt', sep=';', dtype={'genotype':'float32'})
In [3]:
data.head()
Out[3]:
genotype QUAL DP DP2 GQ CALL BEFORE1 BEFORE2 BEFORE3 BEHIND1 BEHIND2 BEHIND3
0 1.0 87093 74 50.0 63.0 3 C C A A A A
1 1.0 56419 64 9.0 100.0 3 A G G A C A
2 1.0 40180 68 48.0 NaN 2 A G G T A A
3 1.0 33677 57 48.0 3.0 3 C A C G G G
4 1.0 78396 51 50.0 21.0 5 A A T A A A

We check how many variables and observations we have. There are 12 variables: 1 dependent and 11 independent. The dependent variable is genotype.

In [4]:
data.shape
Out[4]:
(2294151, 12)

We are checking how many good and bad predictions we have.

In [5]:
good_prediction = data[data['genotype'] == 1].shape[0]
bad_prediction = data[data['genotype'] == 0].shape[0]
In [6]:
print(str(round(good_prediction * 100 / (good_prediction + bad_prediction), 2)) + '% good SNP')
print(str(round(bad_prediction * 100 / (good_prediction + bad_prediction), 2)) + '% bad SNP')
96.74% good SNP
3.26% bad SNP

Change the genotype designation. We do this because we want the model to search for 1 (the bad SNPs, our minority class), not for 0.

In [7]:
data['genotype'] = data['genotype'].map({0.0: 1.0, 1.0: 0.0})  # swap 0 and 1 so bad SNPs become the positive class

Because we read nucleotides as nucleotide triplets, we merge these columns into two: BEFORE and BEHIND.

In [8]:
data['BEFORE'] = data['BEFORE1'] + data['BEFORE2'] + data['BEFORE3']
data['BEHIND'] = data['BEHIND1'] + data['BEHIND2'] + data['BEHIND3']
data = data.drop(['BEFORE1','BEFORE2','BEFORE3','BEHIND1','BEHIND2','BEHIND3'], axis=1)

We use pandas_profiling to generate information about our data. It is the easiest way to get a full overview; we could compute everything step by step, but for a quick review the report is sufficient.

For a given dataset, the pandas profiling package computes statistics such as type inference, unique and missing values, quantile and descriptive statistics, histograms, and correlations.

In [9]:
data.profile_report()
Out[9]:
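If rendering the interactive report inline is too heavy for the browser, a hedged alternative is to write it to an HTML file using the ProfileReport class imported at the top (the exact API varies across pandas-profiling versions):

# Sketch: save the profiling report to disk instead of rendering it inline.
# minimal=True skips the most expensive computations; adjust as needed.
profile = ProfileReport(data, title='SNP data profile', minimal=True)
profile.to_file('snp_profile.html')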

2. Preprocessing

2.1 Dealing with missing data.

Knowing that there is missing data in the variables DP2 and GQ, we need to do something about it.

We have a few ways to deal with it:

  • Delete those variables from our dataset
  • Delete the rows with missing data
  • Impute with the mean, median, KNN, or even linear regression
  • In deep learning it is sometimes recommended to impute missing data with a sentinel value such as 0 or -9999, but this is not advisable when the variable is a continuous integer or float.

In our data, the variable GQ has 24.6% missing values, which is quite a lot, so we can decide to delete this variable from our dataset. DP2 has only 1% missing, so we can impute there. I decided to impute with the mean, because the KNN imputer ran into hardware (memory) limits on this dataset.

In [10]:
data['DP2']=data['DP2'].fillna(data['DP2'].mean())
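For reference, the KNN imputation mentioned above might look like the sketch below (not run here, since it exceeded my hardware limits; the choice of numeric columns used as neighbours is an assumption):

# Hedged sketch: impute DP2 from the 5 nearest rows, measured on the numeric columns.
numeric_cols = ['QUAL', 'DP', 'DP2', 'CALL']   # assumed numeric subset
imputer = KNNImputer(n_neighbors=5)
data[numeric_cols] = imputer.fit_transform(data[numeric_cols])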

Deleting the remaining column with missing data (GQ).

In [11]:
data = data.dropna(axis=1)

Change strings into vectors with the one-hot encoding method, using pd.get_dummies.

We need to do this because a Keras neural network cannot read string features directly.

In [12]:
data = pd.get_dummies(data, prefix=['BEFORE', 'BEHIND'])
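As a toy illustration (hypothetical values) of what pd.get_dummies does to one of these triplet columns:

# Each distinct string becomes its own 0/1 indicator column.
toy = pd.DataFrame({'BEFORE': ['AAA', 'ACG', 'AAA']})
print(pd.get_dummies(toy, prefix=['BEFORE']))
#    BEFORE_AAA  BEFORE_ACG
# 0           1           0
# 1           0           1
# 2           1           0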

Now we can check how many variables we have.

In [13]:
data.shape
Out[13]:
(2294151, 135)

A quick look at the data now: each BEFORE/BEHIND nucleotide triplet has its own indicator column, which is 1 in the rows where that triplet occurs.

In [14]:
data.head()
Out[14]:
genotype QUAL DP DP2 CALL BEFORE_AAA BEFORE_AAC BEFORE_AAG BEFORE_AAT BEFORE_ACA ... BEHIND_TCG BEHIND_TCT BEHIND_TGA BEHIND_TGC BEHIND_TGG BEHIND_TGT BEHIND_TTA BEHIND_TTC BEHIND_TTG BEHIND_TTT
0 0.0 87093 74 50.0 3 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 0.0 56419 64 9.0 3 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 0.0 40180 68 48.0 2 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 0.0 33677 57 48.0 3 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 0.0 78396 51 50.0 5 0 0 0 1 0 ... 0 0 0 0 0 0 0 0 0 0

5 rows × 135 columns

2.2 Separation of data into X - features and Y - labels.

In [92]:
data_x = data.iloc[:, 1:]
data_y = data.iloc[:, :1]

2.3 Separation of the data into train and test sets. I decided to set test_size=0.1 because the dataset is large and I think that is enough.

In [93]:
x_train, x_test, y_train, y_test = train_test_split(data_x, data_y, stratify=data_y, test_size=0.1, random_state=10)

Check that the ratio of good SNPs to bad SNPs is preserved in both splits:

In [94]:
y_train.genotype.value_counts()[0] / (y_train.genotype.value_counts()[1] + y_train.genotype.value_counts()[0])
Out[94]:
0.9674108299612298
In [95]:
y_test.genotype.value_counts()[0] / (y_test.genotype.value_counts()[1] + y_test.genotype.value_counts()[0])
Out[95]:
0.967412909300136

2.4 Standardization with the Z-score method (subtract the mean from the values and divide by the standard deviation): $ z = \frac{x - \mu}{\sigma} $

In [96]:
def standardize(column):
    # Use the *training* mean and std for both splits to avoid data leakage.
    mean = x_train[column].mean()
    std = x_train[column].std()
    x_train.loc[:, column] = (x_train[column] - mean) / std
    x_test.loc[:, column] = (x_test[column] - mean) / std
In [97]:
standardize('QUAL')
standardize('DP')
standardize('DP2')
standardize('CALL')
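For reference (not run here, as the columns above are already standardized), an equivalent sketch with scikit-learn's StandardScaler; note it uses the population standard deviation (ddof=0), so results differ negligibly from pandas' default ddof=1:

from sklearn.preprocessing import StandardScaler

numeric_cols = ['QUAL', 'DP', 'DP2', 'CALL']
scaler = StandardScaler().fit(x_train[numeric_cols])        # fit on train only
x_train[numeric_cols] = scaler.transform(x_train[numeric_cols])
x_test[numeric_cols] = scaler.transform(x_test[numeric_cols])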

2.5 From the training data we separate a validation set.

In [98]:
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, stratify=y_train, test_size=0.1, random_state=11)

3. Building the first, simple neural network.

We check the input shape, i.e. the number of features feeding the first hidden layer.

In [99]:
input_shape = x_train.shape[1]
x_train.shape
Out[99]:
(1858261, 134)

I have written 3 functions to manually define my own metric for the neural network. It is just the F1 score.

The F1 score is defined as the harmonic mean of precision and recall:

$ F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}} $

In [23]:
def recall_m(y_true, y_pred):
    # True positives over all actual positives.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
    recall = true_positives / (possible_positives + K.epsilon())
    return recall

def precision_m(y_true, y_pred):
    # True positives over all predicted positives.
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision

def f1_m(y_true, y_pred):
    # Harmonic mean of precision and recall; epsilon avoids division by zero.
    precision = precision_m(y_true, y_pred)
    recall = recall_m(y_true, y_pred)
    return 2 * ((precision * recall) / (precision + recall + K.epsilon()))
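As a quick sanity check, the custom metric can be compared with scikit-learn's f1_score on a tiny hand-made example (the values here are hypothetical):

from sklearn.metrics import f1_score

y_t = np.array([1., 0., 1., 1., 0.], dtype='float32')
y_p = np.array([0.9, 0.2, 0.4, 0.8, 0.1], dtype='float32')
print(K.eval(f1_m(K.constant(y_t), K.constant(y_p))))  # ~0.80
print(f1_score(y_t, np.round(y_p)))                    # 0.8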

3.1 Building our model.

Here we define our model. I used ReLU activations in the hidden layers and a sigmoid output, which are standard choices for binary classification. The optimizer I chose is Adam, although there are many more: I also tried SGD and RMSprop, but Adam gave the best results.

In [24]:
model = Sequential()
model.add(Dense(512, input_shape=(134,), activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(32, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
    
model.compile(optimizer='adam', 
                  loss='binary_crossentropy', 
                  metrics=[f1_m])

We can inspect the model's parameters:

In [25]:
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 512)               69120     
_________________________________________________________________
dense_2 (Dense)              (None, 256)               131328    
_________________________________________________________________
dense_3 (Dense)              (None, 128)               32896     
_________________________________________________________________
dense_4 (Dense)              (None, 64)                8256      
_________________________________________________________________
dense_5 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_6 (Dense)              (None, 16)                528       
_________________________________________________________________
dense_7 (Dense)              (None, 8)                 136       
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 9         
=================================================================
Total params: 244,353
Trainable params: 244,353
Non-trainable params: 0
_________________________________________________________________

3.2 Don't let your NN overfit

Ways to prevent our neural network from overfitting:

  • Early stopping
  • The dropout technique
  • L1 and L2 regularization

In this case I chose early stopping; it is the simplest and most common approach.

These callbacks watch the F1 score and stop training once it has not improved for 5 epochs in a row. (Note that the code below monitors the training metric 'f1_m'; to stop on the validation F1, as intended here, you would monitor 'val_f1_m' instead.)

In [26]:
callback_list = [
    EarlyStopping(
        monitor='f1_m',        # training F1; use 'val_f1_m' to watch validation
        patience=5,
        mode='max'
    ),
    ModelCheckpoint(
        filepath='my_model.h5',
        monitor='f1_m',
        save_best_only=True,   # keep only the best-scoring weights
        mode='max'
    )
]
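For completeness, the checkpoint saved above can be restored later; Keras requires custom metrics to be passed through custom_objects (a sketch, not run in this notebook):

from keras.models import load_model

# Restore the best checkpointed weights; the custom metric must be supplied.
best_model = load_model('my_model.h5', custom_objects={'f1_m': f1_m})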

3.3 Let's begin training our model.

In [27]:
history = model.fit(x_train, y_train, 
          batch_size=512, 
          epochs=200,
          callbacks=callback_list,
          validation_data=(x_val, y_val))
Train on 1858261 samples, validate on 206474 samples
Epoch 1/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.1015 - f1_m: 0.4327 - val_loss: 0.0939 - val_f1_m: 0.4750
Epoch 2/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0944 - f1_m: 0.4701 - val_loss: 0.0934 - val_f1_m: 0.4750
Epoch 3/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0933 - f1_m: 0.4704 - val_loss: 0.0930 - val_f1_m: 0.4752
Epoch 4/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0925 - f1_m: 0.4700 - val_loss: 0.0926 - val_f1_m: 0.4752
Epoch 5/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0916 - f1_m: 0.4720 - val_loss: 0.0924 - val_f1_m: 0.4751
Epoch 6/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0906 - f1_m: 0.4736 - val_loss: 0.0926 - val_f1_m: 0.4754
Epoch 7/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0896 - f1_m: 0.4749 - val_loss: 0.0930 - val_f1_m: 0.4764
Epoch 8/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0886 - f1_m: 0.4780 - val_loss: 0.0948 - val_f1_m: 0.4758
Epoch 9/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0875 - f1_m: 0.4838 - val_loss: 0.0941 - val_f1_m: 0.4815
Epoch 10/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0864 - f1_m: 0.4874 - val_loss: 0.0942 - val_f1_m: 0.4772
Epoch 11/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0852 - f1_m: 0.4945 - val_loss: 0.0949 - val_f1_m: 0.4785
Epoch 12/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0842 - f1_m: 0.4993 - val_loss: 0.0958 - val_f1_m: 0.4788
Epoch 13/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0831 - f1_m: 0.5053 - val_loss: 0.0969 - val_f1_m: 0.4786
Epoch 14/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0820 - f1_m: 0.5103 - val_loss: 0.0975 - val_f1_m: 0.4802
Epoch 15/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0810 - f1_m: 0.5129 - val_loss: 0.0984 - val_f1_m: 0.4797
Epoch 16/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0800 - f1_m: 0.5186 - val_loss: 0.0997 - val_f1_m: 0.4823
Epoch 17/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0790 - f1_m: 0.5232 - val_loss: 0.1019 - val_f1_m: 0.4789
Epoch 18/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0780 - f1_m: 0.5312 - val_loss: 0.1016 - val_f1_m: 0.4808
Epoch 19/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0771 - f1_m: 0.5333 - val_loss: 0.1038 - val_f1_m: 0.4776
Epoch 20/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0762 - f1_m: 0.5384 - val_loss: 0.1077 - val_f1_m: 0.4789
Epoch 21/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0753 - f1_m: 0.5439 - val_loss: 0.1073 - val_f1_m: 0.4786
Epoch 22/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0744 - f1_m: 0.5499 - val_loss: 0.1093 - val_f1_m: 0.4758
Epoch 23/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0735 - f1_m: 0.5530 - val_loss: 0.1100 - val_f1_m: 0.4763
Epoch 24/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0727 - f1_m: 0.5599 - val_loss: 0.1103 - val_f1_m: 0.4757
Epoch 25/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0720 - f1_m: 0.5628 - val_loss: 0.1135 - val_f1_m: 0.4773
Epoch 26/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0711 - f1_m: 0.5669 - val_loss: 0.1153 - val_f1_m: 0.4768
Epoch 27/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0702 - f1_m: 0.5720 - val_loss: 0.1189 - val_f1_m: 0.4743
Epoch 28/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0697 - f1_m: 0.5743 - val_loss: 0.1159 - val_f1_m: 0.4747
Epoch 29/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0689 - f1_m: 0.5784 - val_loss: 0.1180 - val_f1_m: 0.4749
Epoch 30/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0681 - f1_m: 0.5814 - val_loss: 0.1290 - val_f1_m: 0.4796
Epoch 31/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0675 - f1_m: 0.5857 - val_loss: 0.1278 - val_f1_m: 0.4733
Epoch 32/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0668 - f1_m: 0.5893 - val_loss: 0.1293 - val_f1_m: 0.4747
Epoch 33/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0662 - f1_m: 0.5907 - val_loss: 0.1298 - val_f1_m: 0.4695
Epoch 34/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0655 - f1_m: 0.5970 - val_loss: 0.1314 - val_f1_m: 0.4754
Epoch 35/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0650 - f1_m: 0.5985 - val_loss: 0.1358 - val_f1_m: 0.4719
Epoch 36/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0645 - f1_m: 0.6013 - val_loss: 0.1374 - val_f1_m: 0.4739
Epoch 37/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0639 - f1_m: 0.6034 - val_loss: 0.1345 - val_f1_m: 0.4733
Epoch 38/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0634 - f1_m: 0.6053 - val_loss: 0.1415 - val_f1_m: 0.4714
Epoch 39/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0628 - f1_m: 0.6079 - val_loss: 0.1401 - val_f1_m: 0.4710
Epoch 40/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0622 - f1_m: 0.6137 - val_loss: 0.1474 - val_f1_m: 0.4671
Epoch 41/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0616 - f1_m: 0.6155 - val_loss: 0.1484 - val_f1_m: 0.4705
Epoch 42/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0612 - f1_m: 0.6170 - val_loss: 0.1529 - val_f1_m: 0.4654
Epoch 43/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0606 - f1_m: 0.6186 - val_loss: 0.1512 - val_f1_m: 0.4679
Epoch 44/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0603 - f1_m: 0.6206 - val_loss: 0.1570 - val_f1_m: 0.4645
Epoch 45/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0597 - f1_m: 0.6240 - val_loss: 0.1577 - val_f1_m: 0.4673
Epoch 46/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0592 - f1_m: 0.6283 - val_loss: 0.1571 - val_f1_m: 0.4646
Epoch 47/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0588 - f1_m: 0.6295 - val_loss: 0.1523 - val_f1_m: 0.4619
Epoch 48/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0584 - f1_m: 0.6308 - val_loss: 0.1595 - val_f1_m: 0.4630
Epoch 49/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0580 - f1_m: 0.6326 - val_loss: 0.1647 - val_f1_m: 0.4653
Epoch 50/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0576 - f1_m: 0.6355 - val_loss: 0.1691 - val_f1_m: 0.4650
Epoch 51/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0572 - f1_m: 0.6379 - val_loss: 0.1700 - val_f1_m: 0.4665
Epoch 52/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0568 - f1_m: 0.6405 - val_loss: 0.1678 - val_f1_m: 0.4569
Epoch 53/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0564 - f1_m: 0.6411 - val_loss: 0.1772 - val_f1_m: 0.4625
Epoch 54/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0562 - f1_m: 0.6428 - val_loss: 0.1758 - val_f1_m: 0.4616
Epoch 55/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0556 - f1_m: 0.6464 - val_loss: 0.1805 - val_f1_m: 0.4617
Epoch 56/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0554 - f1_m: 0.6474 - val_loss: 0.1751 - val_f1_m: 0.4626
Epoch 57/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0548 - f1_m: 0.6506 - val_loss: 0.1817 - val_f1_m: 0.4649
Epoch 58/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0546 - f1_m: 0.6529 - val_loss: 0.1769 - val_f1_m: 0.4638
Epoch 59/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0543 - f1_m: 0.6546 - val_loss: 0.1826 - val_f1_m: 0.4617
Epoch 60/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0541 - f1_m: 0.6543 - val_loss: 0.1833 - val_f1_m: 0.4579
Epoch 61/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0535 - f1_m: 0.6576 - val_loss: 0.1851 - val_f1_m: 0.4631
Epoch 62/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0534 - f1_m: 0.6586 - val_loss: 0.1872 - val_f1_m: 0.4634
Epoch 63/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0529 - f1_m: 0.6609 - val_loss: 0.1894 - val_f1_m: 0.4587
Epoch 64/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0526 - f1_m: 0.6619 - val_loss: 0.1832 - val_f1_m: 0.4578
Epoch 65/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0522 - f1_m: 0.6642 - val_loss: 0.1998 - val_f1_m: 0.4569
Epoch 66/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0522 - f1_m: 0.6660 - val_loss: 0.1854 - val_f1_m: 0.4597
Epoch 67/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0517 - f1_m: 0.6690 - val_loss: 0.1911 - val_f1_m: 0.4623
Epoch 68/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0514 - f1_m: 0.6721 - val_loss: 0.1980 - val_f1_m: 0.4584
Epoch 69/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0511 - f1_m: 0.6702 - val_loss: 0.2039 - val_f1_m: 0.4597
Epoch 70/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0508 - f1_m: 0.6712 - val_loss: 0.2007 - val_f1_m: 0.4512
Epoch 71/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0504 - f1_m: 0.6760 - val_loss: 0.2029 - val_f1_m: 0.4567
Epoch 72/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0503 - f1_m: 0.6753 - val_loss: 0.2046 - val_f1_m: 0.4537
Epoch 73/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0499 - f1_m: 0.6796 - val_loss: 0.1991 - val_f1_m: 0.4560
Epoch 74/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0498 - f1_m: 0.6783 - val_loss: 0.2034 - val_f1_m: 0.4573
Epoch 75/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0495 - f1_m: 0.6808 - val_loss: 0.2053 - val_f1_m: 0.4575
Epoch 76/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0493 - f1_m: 0.6816 - val_loss: 0.2051 - val_f1_m: 0.4535
Epoch 77/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0490 - f1_m: 0.6829 - val_loss: 0.2208 - val_f1_m: 0.4546
Epoch 78/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0487 - f1_m: 0.6844 - val_loss: 0.2129 - val_f1_m: 0.4521
Epoch 79/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0483 - f1_m: 0.6888 - val_loss: 0.2267 - val_f1_m: 0.4578
Epoch 80/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0482 - f1_m: 0.6873 - val_loss: 0.2232 - val_f1_m: 0.4458
Epoch 81/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0480 - f1_m: 0.6873 - val_loss: 0.2394 - val_f1_m: 0.4525
Epoch 82/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0478 - f1_m: 0.6907 - val_loss: 0.2213 - val_f1_m: 0.4566
Epoch 83/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0475 - f1_m: 0.6919 - val_loss: 0.2177 - val_f1_m: 0.4534
Epoch 84/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0473 - f1_m: 0.6919 - val_loss: 0.2267 - val_f1_m: 0.4518
Epoch 85/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0470 - f1_m: 0.6941 - val_loss: 0.2337 - val_f1_m: 0.4547
Epoch 86/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0469 - f1_m: 0.6950 - val_loss: 0.2253 - val_f1_m: 0.4510
Epoch 87/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0465 - f1_m: 0.6980 - val_loss: 0.2412 - val_f1_m: 0.4516
Epoch 88/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0465 - f1_m: 0.6977 - val_loss: 0.2259 - val_f1_m: 0.4495
Epoch 89/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0462 - f1_m: 0.6976 - val_loss: 0.2259 - val_f1_m: 0.4473
Epoch 90/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0459 - f1_m: 0.7023 - val_loss: 0.2298 - val_f1_m: 0.4491
Epoch 91/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0457 - f1_m: 0.7019 - val_loss: 0.2375 - val_f1_m: 0.4490
Epoch 92/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0456 - f1_m: 0.7034 - val_loss: 0.2316 - val_f1_m: 0.4525
Epoch 93/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0455 - f1_m: 0.7054 - val_loss: 0.2458 - val_f1_m: 0.4506
Epoch 94/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0451 - f1_m: 0.7049 - val_loss: 0.2300 - val_f1_m: 0.4510
Epoch 95/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0449 - f1_m: 0.7077 - val_loss: 0.2285 - val_f1_m: 0.4513
Epoch 96/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0448 - f1_m: 0.7083 - val_loss: 0.2457 - val_f1_m: 0.4501
Epoch 97/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0447 - f1_m: 0.7089 - val_loss: 0.2405 - val_f1_m: 0.4507
Epoch 98/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0444 - f1_m: 0.7102 - val_loss: 0.2518 - val_f1_m: 0.4494
Epoch 99/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0440 - f1_m: 0.7123 - val_loss: 0.2459 - val_f1_m: 0.4440
Epoch 100/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0439 - f1_m: 0.7143 - val_loss: 0.2418 - val_f1_m: 0.4460
Epoch 101/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0438 - f1_m: 0.7159 - val_loss: 0.2436 - val_f1_m: 0.4458
Epoch 102/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0435 - f1_m: 0.7158 - val_loss: 0.2574 - val_f1_m: 0.4465
Epoch 103/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0435 - f1_m: 0.7148 - val_loss: 0.2541 - val_f1_m: 0.4490
Epoch 104/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0433 - f1_m: 0.7172 - val_loss: 0.2526 - val_f1_m: 0.4465
Epoch 105/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0432 - f1_m: 0.7182 - val_loss: 0.2398 - val_f1_m: 0.4521
Epoch 106/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0429 - f1_m: 0.7200 - val_loss: 0.2660 - val_f1_m: 0.4465
Epoch 107/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0429 - f1_m: 0.7194 - val_loss: 0.2553 - val_f1_m: 0.4487
Epoch 108/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0426 - f1_m: 0.7212 - val_loss: 0.2434 - val_f1_m: 0.4462
Epoch 109/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0424 - f1_m: 0.7231 - val_loss: 0.2770 - val_f1_m: 0.4480
Epoch 110/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0423 - f1_m: 0.7232 - val_loss: 0.2546 - val_f1_m: 0.4421
Epoch 111/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0422 - f1_m: 0.7242 - val_loss: 0.2778 - val_f1_m: 0.4451
Epoch 112/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0419 - f1_m: 0.7255 - val_loss: 0.2646 - val_f1_m: 0.4452
Epoch 113/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0418 - f1_m: 0.7256 - val_loss: 0.2645 - val_f1_m: 0.4497
Epoch 114/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0417 - f1_m: 0.7273 - val_loss: 0.2666 - val_f1_m: 0.4443
Epoch 115/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0415 - f1_m: 0.7273 - val_loss: 0.2787 - val_f1_m: 0.4385
Epoch 116/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0412 - f1_m: 0.7296 - val_loss: 0.2862 - val_f1_m: 0.4456
Epoch 117/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0411 - f1_m: 0.7311 - val_loss: 0.2809 - val_f1_m: 0.4395
Epoch 118/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0411 - f1_m: 0.7327 - val_loss: 0.2803 - val_f1_m: 0.4455
Epoch 119/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0408 - f1_m: 0.7334 - val_loss: 0.2877 - val_f1_m: 0.4410
Epoch 120/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0407 - f1_m: 0.7341 - val_loss: 0.2569 - val_f1_m: 0.4403
Epoch 121/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0405 - f1_m: 0.7353 - val_loss: 0.2714 - val_f1_m: 0.4392
Epoch 122/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0402 - f1_m: 0.7369 - val_loss: 0.2729 - val_f1_m: 0.4315
Epoch 123/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0404 - f1_m: 0.7374 - val_loss: 0.2557 - val_f1_m: 0.4343
Epoch 124/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0403 - f1_m: 0.7363 - val_loss: 0.2742 - val_f1_m: 0.4360
Epoch 125/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0400 - f1_m: 0.7386 - val_loss: 0.2823 - val_f1_m: 0.4415
Epoch 126/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0400 - f1_m: 0.7381 - val_loss: 0.2902 - val_f1_m: 0.4415
Epoch 127/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0397 - f1_m: 0.7398 - val_loss: 0.2642 - val_f1_m: 0.4324
Epoch 128/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0395 - f1_m: 0.7417 - val_loss: 0.2750 - val_f1_m: 0.4377
Epoch 129/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0393 - f1_m: 0.7434 - val_loss: 0.2871 - val_f1_m: 0.4317
Epoch 130/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0394 - f1_m: 0.7415 - val_loss: 0.2868 - val_f1_m: 0.4372
Epoch 131/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0393 - f1_m: 0.7432 - val_loss: 0.2786 - val_f1_m: 0.4410
Epoch 132/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0391 - f1_m: 0.7428 - val_loss: 0.2977 - val_f1_m: 0.4414
Epoch 133/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0390 - f1_m: 0.7443 - val_loss: 0.2863 - val_f1_m: 0.4410
Epoch 134/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0390 - f1_m: 0.7453 - val_loss: 0.2831 - val_f1_m: 0.4366
Epoch 135/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0386 - f1_m: 0.7470 - val_loss: 0.2777 - val_f1_m: 0.4312
Epoch 136/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0386 - f1_m: 0.7473 - val_loss: 0.2966 - val_f1_m: 0.4384
Epoch 137/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0385 - f1_m: 0.7471 - val_loss: 0.2763 - val_f1_m: 0.4334
Epoch 138/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0384 - f1_m: 0.7491 - val_loss: 0.2687 - val_f1_m: 0.4346
Epoch 139/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0382 - f1_m: 0.7504 - val_loss: 0.2956 - val_f1_m: 0.4392
Epoch 140/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0382 - f1_m: 0.7504 - val_loss: 0.2863 - val_f1_m: 0.4350
Epoch 141/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0380 - f1_m: 0.7500 - val_loss: 0.3104 - val_f1_m: 0.4335
Epoch 142/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0379 - f1_m: 0.7511 - val_loss: 0.2814 - val_f1_m: 0.4298
Epoch 143/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0378 - f1_m: 0.7525 - val_loss: 0.3003 - val_f1_m: 0.4333
Epoch 144/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0378 - f1_m: 0.7517 - val_loss: 0.2685 - val_f1_m: 0.4328
Epoch 145/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0377 - f1_m: 0.7530 - val_loss: 0.2630 - val_f1_m: 0.4296
Epoch 146/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0375 - f1_m: 0.7553 - val_loss: 0.2602 - val_f1_m: 0.4345
Epoch 147/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0376 - f1_m: 0.7547 - val_loss: 0.2884 - val_f1_m: 0.4339
Epoch 148/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0374 - f1_m: 0.7548 - val_loss: 0.2695 - val_f1_m: 0.4356
Epoch 149/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0371 - f1_m: 0.7569 - val_loss: 0.2978 - val_f1_m: 0.4384
Epoch 150/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0371 - f1_m: 0.7570 - val_loss: 0.3064 - val_f1_m: 0.4429
Epoch 151/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0370 - f1_m: 0.7576 - val_loss: 0.2869 - val_f1_m: 0.4353
Epoch 152/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0370 - f1_m: 0.7566 - val_loss: 0.3072 - val_f1_m: 0.4386
Epoch 153/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0368 - f1_m: 0.7588 - val_loss: 0.3125 - val_f1_m: 0.4363
Epoch 154/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0367 - f1_m: 0.7605 - val_loss: 0.3232 - val_f1_m: 0.4306
Epoch 155/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0367 - f1_m: 0.7603 - val_loss: 0.3059 - val_f1_m: 0.4299
Epoch 156/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0365 - f1_m: 0.7616 - val_loss: 0.3151 - val_f1_m: 0.4320
Epoch 157/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0366 - f1_m: 0.7603 - val_loss: 0.3046 - val_f1_m: 0.4316
Epoch 158/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0363 - f1_m: 0.7623 - val_loss: 0.3123 - val_f1_m: 0.4303
Epoch 159/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0363 - f1_m: 0.7626 - val_loss: 0.3077 - val_f1_m: 0.4354
Epoch 160/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0360 - f1_m: 0.7635 - val_loss: 0.3377 - val_f1_m: 0.4318
Epoch 161/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0361 - f1_m: 0.7622 - val_loss: 0.3057 - val_f1_m: 0.4306
Epoch 162/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0359 - f1_m: 0.7650 - val_loss: 0.3246 - val_f1_m: 0.4414
Epoch 163/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0358 - f1_m: 0.7672 - val_loss: 0.3141 - val_f1_m: 0.4322
Epoch 164/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0357 - f1_m: 0.7660 - val_loss: 0.3190 - val_f1_m: 0.4266
Epoch 165/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0357 - f1_m: 0.7662 - val_loss: 0.3466 - val_f1_m: 0.4337
Epoch 166/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0355 - f1_m: 0.7666 - val_loss: 0.3337 - val_f1_m: 0.4331
Epoch 167/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0356 - f1_m: 0.7675 - val_loss: 0.3369 - val_f1_m: 0.4295
Epoch 168/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0354 - f1_m: 0.7701 - val_loss: 0.3415 - val_f1_m: 0.4344
Epoch 169/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0352 - f1_m: 0.7704 - val_loss: 0.3307 - val_f1_m: 0.4326
Epoch 170/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0351 - f1_m: 0.7702 - val_loss: 0.3504 - val_f1_m: 0.4305
Epoch 171/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0351 - f1_m: 0.7704 - val_loss: 0.3319 - val_f1_m: 0.4276
Epoch 172/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0350 - f1_m: 0.7711 - val_loss: 0.3361 - val_f1_m: 0.4237
Epoch 173/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0349 - f1_m: 0.7711 - val_loss: 0.3168 - val_f1_m: 0.4330
Epoch 174/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0348 - f1_m: 0.7734 - val_loss: 0.3138 - val_f1_m: 0.4293
Epoch 175/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0346 - f1_m: 0.7727 - val_loss: 0.3204 - val_f1_m: 0.4344
Epoch 176/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0346 - f1_m: 0.7750 - val_loss: 0.3082 - val_f1_m: 0.4297
Epoch 177/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0346 - f1_m: 0.7729 - val_loss: 0.3444 - val_f1_m: 0.4382
Epoch 178/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0345 - f1_m: 0.7748 - val_loss: 0.3318 - val_f1_m: 0.4319
Epoch 179/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0345 - f1_m: 0.7766 - val_loss: 0.3134 - val_f1_m: 0.4328
Epoch 180/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0343 - f1_m: 0.7761 - val_loss: 0.3143 - val_f1_m: 0.4303
Epoch 181/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0343 - f1_m: 0.7762 - val_loss: 0.3376 - val_f1_m: 0.4264
Epoch 182/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0340 - f1_m: 0.7777 - val_loss: 0.3407 - val_f1_m: 0.4302
Epoch 183/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0340 - f1_m: 0.7767 - val_loss: 0.3221 - val_f1_m: 0.4269
Epoch 184/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0339 - f1_m: 0.7791 - val_loss: 0.3104 - val_f1_m: 0.4301
Epoch 185/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0340 - f1_m: 0.7794 - val_loss: 0.3190 - val_f1_m: 0.4257
Epoch 186/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0339 - f1_m: 0.7792 - val_loss: 0.3425 - val_f1_m: 0.4294
Epoch 187/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0336 - f1_m: 0.7802 - val_loss: 0.3156 - val_f1_m: 0.4260
Epoch 188/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0337 - f1_m: 0.7812 - val_loss: 0.3379 - val_f1_m: 0.4353
Epoch 189/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0336 - f1_m: 0.7812 - val_loss: 0.3347 - val_f1_m: 0.4289
Epoch 190/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0335 - f1_m: 0.7809 - val_loss: 0.3384 - val_f1_m: 0.4239
Epoch 191/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0334 - f1_m: 0.7812 - val_loss: 0.3367 - val_f1_m: 0.4247
Epoch 192/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0334 - f1_m: 0.7835 - val_loss: 0.3302 - val_f1_m: 0.4304
Epoch 193/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0335 - f1_m: 0.7808 - val_loss: 0.3031 - val_f1_m: 0.4316
Epoch 194/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0332 - f1_m: 0.7835 - val_loss: 0.3641 - val_f1_m: 0.4243
Epoch 195/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0332 - f1_m: 0.7824 - val_loss: 0.3692 - val_f1_m: 0.4289
Epoch 196/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0331 - f1_m: 0.7840 - val_loss: 0.3717 - val_f1_m: 0.4300
Epoch 197/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0330 - f1_m: 0.7838 - val_loss: 0.3630 - val_f1_m: 0.4215
Epoch 198/200
1858261/1858261 [==============================] - 25s 14us/step - loss: 0.0328 - f1_m: 0.7863 - val_loss: 0.3511 - val_f1_m: 0.4209
Epoch 199/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0329 - f1_m: 0.7859 - val_loss: 0.3310 - val_f1_m: 0.4213
Epoch 200/200
1858261/1858261 [==============================] - 26s 14us/step - loss: 0.0328 - f1_m: 0.7866 - val_loss: 0.3620 - val_f1_m: 0.4202

4. Results

4.1 Let's check the first results

In [28]:
results = model.evaluate(x_test, y_test)
print('loss: ', results[0])
print('F1-score: ', results[1])
229416/229416 [==============================] - 4s 19us/step
loss:  0.37474001027753645
F1-score:  0.2359544038772583

4.2 Confusion matrix

In [29]:
y_pred = model.predict(x_test)
y_pred_class = model.predict_classes(x_test)
cm = confusion_matrix(y_test,y_pred_class)
cr = classification_report(y_test,y_pred_class)
In [30]:
ax = plt.subplot()
sn.heatmap(cm, annot=True, fmt='g', cmap="Blues", ax=ax)
# Rows of a sklearn confusion matrix are true labels, columns are predictions.
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')
ax.xaxis.set_ticklabels(['0', '1'])
ax.yaxis.set_ticklabels(['0', '1'])
Out[30]:
[Text(0, 0.5, '0'), Text(0, 1.5, '1')]

4.3 Sensitivity (Recall), Precision and Specificity

In [31]:
print(cr)

print('Precision (class 0):', str(round(cm[0][0] * 100 / (cm[0][0] + cm[1][0]), 2)) + '%')
print('Specificity:', str(round(cm[0][0] * 100 / (cm[0][0] + cm[0][1]), 2)) + '%')
print('Precision (class 1):', str(round(cm[1][1] * 100 / (cm[1][1] + cm[0][1]), 2)) + '%')
              precision    recall  f1-score   support

         0.0       0.98      0.99      0.98    221940
         1.0       0.48      0.37      0.42      7476

    accuracy                           0.97    229416
   macro avg       0.73      0.68      0.70    229416
weighted avg       0.96      0.97      0.96    229416

Precision (class 0): 97.89%
Specificity: 98.66%
Precision (class 1): 48.0%

Sensitivity (Recall) = True Positives / (True Positives + False Negatives)

The recall metric shows how many of the relevant samples are selected, i.e. how well our model finds all the samples of interest (bad SNPs, class 1). From the classification report above, recall for class 1 is 0.37.

Precision = True Positives / (True Positives + False Positives)

The precision metric tells us how many of the predicted samples are actually relevant, i.e. how often we mistakenly classify a sample as positive when it is not.

Specificity = True Negatives / (True Negatives + False Positives)

This is the probability that a good SNP (0) will be classified correctly (0).
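Treating bad SNPs (1) as the positive class, all four counts can be unpacked straight from the confusion matrix (a small sketch; sklearn returns them in tn, fp, fn, tp order for binary labels):

tn, fp, fn, tp = cm.ravel()
sensitivity = tp / (tp + fn)   # recall for class 1 (bad SNPs)
specificity = tn / (tn + fp)   # recall for class 0 (good SNPs)
precision = tp / (tp + fp)     # precision for class 1
print(sensitivity, specificity, precision)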

4.4 Balanced accuracy score

The balanced_accuracy_score function computes the balanced accuracy, which avoids inflated performance estimates on imbalanced datasets. It is the macro-average of recall scores per class or, equivalently, raw accuracy where each sample is weighted according to the inverse prevalence of its true class. Thus for balanced datasets, the score is equal to accuracy. In the binary case, balanced accuracy is equal to the arithmetic mean of sensitivity (true positive rate) and specificity (true negative rate), or the area under the ROC curve with binary predictions rather than scores. [https://scikit-learn.org/]

In [32]:
balanced_accuracy_score(y_test, y_pred_class)
Out[32]:
0.6772752378666974

4.5 Matthews Correlation Coefficient

The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives, and is generally regarded as a balanced measure that can be used even when the classes are of very different sizes. The MCC is in essence a correlation coefficient between -1 and +1: +1 represents a perfect prediction, 0 an average random prediction, and -1 a completely inverse prediction. The statistic is also known as the phi coefficient. In the binary case, with TP, TN, FP and FN respectively the numbers of true positives, true negatives, false positives and false negatives, the MCC is defined as:

$ MCC = \frac{tp \times tn - fp \times fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}. $

In [33]:
matthews_corrcoef(y_test, y_pred_class)
Out[33]:
0.40336375795068147
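For illustration, the same value can be computed by hand from the confusion matrix of section 4.2 (a sketch; casting to float avoids integer overflow in the product):

tn, fp, fn, tp = [float(v) for v in cm.ravel()]
mcc_manual = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(mcc_manual)  # should match matthews_corrcoef above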

5. Plots

5.1 This plot shows how the F1 score changed over the epochs on the training and validation data. It is a really common diagnostic plot, though not strictly necessary.

In [34]:
plt.plot(history.history['f1_m'])
plt.plot(history.history['val_f1_m'])
plt.title('model F1 score')
plt.ylabel('F1')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

5.2 Plot of the decline in training loss versus validation loss.

In [35]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

5.3 ROC plot.

The area under the curve (AUC) tells us how well the NN ranks bad SNPs above good ones; the more convex the curve, the better. (Computed with hard 0/1 predictions it would equal the balanced accuracy; here it is computed from the predicted probabilities, so the values differ.)

In [62]:
lw = 2
fpr, tpr, thresholds = roc_curve(y_test, y_pred)
roc_auc = auc(fpr, tpr)

def plot_roc_curve(fpr, tpr):
    plt.plot(fpr, tpr, color='darkorange',
             lw=lw, label='ROC curve (area = %0.2f)' % roc_auc)
    plt.plot([0, 1], [0, 1], color='navy', lw=lw, linestyle='--')
    plt.axis([0, 1, 0, 1])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.legend(loc="lower right")
    plt.show()

plot_roc_curve(fpr, tpr)
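The same area under the curve can also be computed directly with roc_auc_score, which was imported at the top:

print('ROC AUC:', roc_auc_score(y_test, y_pred))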

6. Second Neural Network, with class weights.

  • Large weight: assigned to examples from the minority class (bad SNPs, 1).
  • Small weight: assigned to examples from the majority class (good SNPs, 0).

This time I used the dropout method to prevent overfitting, because we saw that early stopping did not work really well with our data. A data-driven way of picking the class weights is sketched below.
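Instead of hand-picking the weights, a hedged alternative is scikit-learn's 'balanced' heuristic, which for our 3.26% minority gives roughly {0: 0.52, 1: 15.3}, a more aggressive reweighting than the {0: 1, 1: 5} used below:

from sklearn.utils.class_weight import compute_class_weight

# 'balanced' weights each class by n_samples / (n_classes * class_count).
w = compute_class_weight(class_weight='balanced',
                         classes=np.array([0.0, 1.0]),
                         y=y_train.genotype.values)
weights = {0: w[0], 1: w[1]}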

In [100]:
model = Sequential()
model.add(Dense(512, input_shape=(134,), activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='adam', 
                  loss='binary_crossentropy', 
                  metrics=[f1_m])
In [101]:
weights = {0: 1, 1: 5}  # majority class (good SNPs) weight 1, minority class (bad SNPs) weight 5
history = model.fit(x_train, y_train,
          class_weight=weights,
          batch_size=512, 
          epochs=100,
          validation_data=(x_val, y_val))
Train on 1858261 samples, validate on 206474 samples
Epoch 1/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.3267 - f1_m: 0.4469 - val_loss: 0.1485 - val_f1_m: 0.4762
Epoch 2/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.3104 - f1_m: 0.4632 - val_loss: 0.1376 - val_f1_m: 0.4857
Epoch 3/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.3069 - f1_m: 0.4657 - val_loss: 0.1439 - val_f1_m: 0.4867
Epoch 4/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.3051 - f1_m: 0.4680 - val_loss: 0.1495 - val_f1_m: 0.4883
Epoch 5/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.3029 - f1_m: 0.4694 - val_loss: 0.1408 - val_f1_m: 0.4861
Epoch 6/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.3017 - f1_m: 0.4709 - val_loss: 0.1470 - val_f1_m: 0.4816
Epoch 7/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2999 - f1_m: 0.4725 - val_loss: 0.1413 - val_f1_m: 0.4881
Epoch 8/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2984 - f1_m: 0.4740 - val_loss: 0.1448 - val_f1_m: 0.4833
Epoch 9/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2969 - f1_m: 0.4732 - val_loss: 0.1446 - val_f1_m: 0.4694
Epoch 10/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2952 - f1_m: 0.4742 - val_loss: 0.1387 - val_f1_m: 0.4863
Epoch 11/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2935 - f1_m: 0.4765 - val_loss: 0.1395 - val_f1_m: 0.4840
Epoch 12/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2920 - f1_m: 0.4764 - val_loss: 0.1405 - val_f1_m: 0.4847
Epoch 13/100
1858261/1858261 [==============================] - 42s 23us/step - loss: 0.2909 - f1_m: 0.4773 - val_loss: 0.1397 - val_f1_m: 0.4886
Epoch 14/100
1858261/1858261 [==============================] - 44s 23us/step - loss: 0.2896 - f1_m: 0.4762 - val_loss: 0.1405 - val_f1_m: 0.4761
Epoch 15/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2888 - f1_m: 0.4767 - val_loss: 0.1448 - val_f1_m: 0.4851
Epoch 16/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2874 - f1_m: 0.4774 - val_loss: 0.1417 - val_f1_m: 0.4774
Epoch 17/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2862 - f1_m: 0.4789 - val_loss: 0.1367 - val_f1_m: 0.4799
Epoch 18/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2854 - f1_m: 0.4778 - val_loss: 0.1396 - val_f1_m: 0.4785
Epoch 19/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2842 - f1_m: 0.4776 - val_loss: 0.1403 - val_f1_m: 0.4816
Epoch 20/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2834 - f1_m: 0.4767 - val_loss: 0.1353 - val_f1_m: 0.4857
Epoch 21/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2826 - f1_m: 0.4763 - val_loss: 0.1421 - val_f1_m: 0.4759
Epoch 22/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2820 - f1_m: 0.4781 - val_loss: 0.1479 - val_f1_m: 0.4852
Epoch 23/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2811 - f1_m: 0.4790 - val_loss: 0.1353 - val_f1_m: 0.4772
Epoch 24/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2801 - f1_m: 0.4790 - val_loss: 0.1431 - val_f1_m: 0.4797
Epoch 25/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2795 - f1_m: 0.4806 - val_loss: 0.1416 - val_f1_m: 0.4808
Epoch 26/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2785 - f1_m: 0.4806 - val_loss: 0.1377 - val_f1_m: 0.4729
Epoch 27/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2780 - f1_m: 0.4812 - val_loss: 0.1487 - val_f1_m: 0.4741
Epoch 28/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2772 - f1_m: 0.4816 - val_loss: 0.1405 - val_f1_m: 0.4749
Epoch 29/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2770 - f1_m: 0.4822 - val_loss: 0.1418 - val_f1_m: 0.4807
Epoch 30/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2761 - f1_m: 0.4817 - val_loss: 0.1419 - val_f1_m: 0.4765
Epoch 31/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2763 - f1_m: 0.4821 - val_loss: 0.1429 - val_f1_m: 0.4794
Epoch 32/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2754 - f1_m: 0.4825 - val_loss: 0.1373 - val_f1_m: 0.4733
Epoch 33/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2750 - f1_m: 0.4816 - val_loss: 0.1430 - val_f1_m: 0.4736
Epoch 34/100
1858261/1858261 [==============================] - 44s 24us/step - loss: 0.2744 - f1_m: 0.4806 - val_loss: 0.1427 - val_f1_m: 0.4802
Epoch 35/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2738 - f1_m: 0.4826 - val_loss: 0.1426 - val_f1_m: 0.4799
Epoch 36/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2732 - f1_m: 0.4840 - val_loss: 0.1421 - val_f1_m: 0.4790
Epoch 37/100
1858261/1858261 [==============================] - 42s 23us/step - loss: 0.2732 - f1_m: 0.4808 - val_loss: 0.1476 - val_f1_m: 0.4689
Epoch 38/100
1858261/1858261 [==============================] - 43s 23us/step - loss: 0.2725 - f1_m: 0.4815 - val_loss: 0.1450 - val_f1_m: 0.4731
Epoch 39/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2724 - f1_m: 0.4825 - val_loss: 0.1420 - val_f1_m: 0.4683
Epoch 40/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2715 - f1_m: 0.4818 - val_loss: 0.1348 - val_f1_m: 0.4819
Epoch 41/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2712 - f1_m: 0.4821 - val_loss: 0.1473 - val_f1_m: 0.4807
Epoch 42/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2709 - f1_m: 0.4826 - val_loss: 0.1447 - val_f1_m: 0.4827
Epoch 43/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2701 - f1_m: 0.4825 - val_loss: 0.1424 - val_f1_m: 0.4825
Epoch 44/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2694 - f1_m: 0.4825 - val_loss: 0.1412 - val_f1_m: 0.4818
Epoch 45/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2700 - f1_m: 0.4832 - val_loss: 0.1363 - val_f1_m: 0.4756
Epoch 46/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2690 - f1_m: 0.4819 - val_loss: 0.1486 - val_f1_m: 0.4813
Epoch 47/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2687 - f1_m: 0.4812 - val_loss: 0.1441 - val_f1_m: 0.4720
Epoch 48/100
1858261/1858261 [==============================] - 45s 24us/step - loss: 0.2682 - f1_m: 0.4834 - val_loss: 0.1437 - val_f1_m: 0.4748
Epoch 49/100
1858261/1858261 [==============================] - 43s 23us/step - loss: 0.2682 - f1_m: 0.4826 - val_loss: 0.1414 - val_f1_m: 0.4691
Epoch 50/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2677 - f1_m: 0.4854 - val_loss: 0.1467 - val_f1_m: 0.4814
Epoch 51/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2674 - f1_m: 0.4845 - val_loss: 0.1437 - val_f1_m: 0.4812
Epoch 52/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2680 - f1_m: 0.4841 - val_loss: 0.1435 - val_f1_m: 0.4780
Epoch 53/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2676 - f1_m: 0.4840 - val_loss: 0.1379 - val_f1_m: 0.4834
Epoch 54/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2668 - f1_m: 0.4838 - val_loss: 0.1408 - val_f1_m: 0.4846
Epoch 55/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2661 - f1_m: 0.4830 - val_loss: 0.1474 - val_f1_m: 0.4788
Epoch 56/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2660 - f1_m: 0.4873 - val_loss: 0.1381 - val_f1_m: 0.4753
Epoch 57/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2661 - f1_m: 0.4856 - val_loss: 0.1394 - val_f1_m: 0.4832
Epoch 58/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2660 - f1_m: 0.4849 - val_loss: 0.1513 - val_f1_m: 0.4767
Epoch 59/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2655 - f1_m: 0.4837 - val_loss: 0.1426 - val_f1_m: 0.4790
Epoch 60/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2650 - f1_m: 0.4828 - val_loss: 0.1521 - val_f1_m: 0.4705
Epoch 61/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2647 - f1_m: 0.4830 - val_loss: 0.1402 - val_f1_m: 0.4707
Epoch 62/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2643 - f1_m: 0.4842 - val_loss: 0.1435 - val_f1_m: 0.4777
Epoch 63/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2643 - f1_m: 0.4847 - val_loss: 0.1430 - val_f1_m: 0.4810
Epoch 64/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2647 - f1_m: 0.4852 - val_loss: 0.1480 - val_f1_m: 0.4779
Epoch 65/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2642 - f1_m: 0.4857 - val_loss: 0.1452 - val_f1_m: 0.4683
Epoch 66/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2636 - f1_m: 0.4846 - val_loss: 0.1437 - val_f1_m: 0.4816
Epoch 67/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2634 - f1_m: 0.4865 - val_loss: 0.1446 - val_f1_m: 0.4822
Epoch 68/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2638 - f1_m: 0.4845 - val_loss: 0.1370 - val_f1_m: 0.4832
Epoch 69/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2642 - f1_m: 0.4845 - val_loss: 0.1534 - val_f1_m: 0.4698
Epoch 70/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2631 - f1_m: 0.4846 - val_loss: 0.1493 - val_f1_m: 0.4675
Epoch 71/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2630 - f1_m: 0.4856 - val_loss: 0.1536 - val_f1_m: 0.4760
Epoch 72/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2625 - f1_m: 0.4878 - val_loss: 0.1424 - val_f1_m: 0.4832
Epoch 73/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2626 - f1_m: 0.4856 - val_loss: 0.1475 - val_f1_m: 0.4675
Epoch 74/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2620 - f1_m: 0.4879 - val_loss: 0.1498 - val_f1_m: 0.4790
Epoch 75/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2623 - f1_m: 0.4861 - val_loss: 0.1525 - val_f1_m: 0.4783
Epoch 76/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2625 - f1_m: 0.4867 - val_loss: 0.1493 - val_f1_m: 0.4790
Epoch 77/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2620 - f1_m: 0.4855 - val_loss: 0.1416 - val_f1_m: 0.4856
Epoch 78/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2619 - f1_m: 0.4865 - val_loss: 0.1477 - val_f1_m: 0.4706
Epoch 79/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2619 - f1_m: 0.4865 - val_loss: 0.1394 - val_f1_m: 0.4819
Epoch 80/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2617 - f1_m: 0.4868 - val_loss: 0.1535 - val_f1_m: 0.4819
Epoch 81/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2614 - f1_m: 0.4889 - val_loss: 0.1479 - val_f1_m: 0.4680
Epoch 82/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2611 - f1_m: 0.4882 - val_loss: 0.1508 - val_f1_m: 0.4795
Epoch 83/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2612 - f1_m: 0.4910 - val_loss: 0.1493 - val_f1_m: 0.4780
Epoch 84/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2606 - f1_m: 0.4891 - val_loss: 0.1515 - val_f1_m: 0.4717
Epoch 85/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2604 - f1_m: 0.4885 - val_loss: 0.1540 - val_f1_m: 0.4789
Epoch 86/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2605 - f1_m: 0.4894 - val_loss: 0.1507 - val_f1_m: 0.4763
Epoch 87/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2608 - f1_m: 0.4880 - val_loss: 0.1432 - val_f1_m: 0.4795
Epoch 88/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2601 - f1_m: 0.4890 - val_loss: 0.1476 - val_f1_m: 0.4664
Epoch 89/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2595 - f1_m: 0.4885 - val_loss: 0.1433 - val_f1_m: 0.4793
Epoch 90/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2603 - f1_m: 0.4872 - val_loss: 0.1485 - val_f1_m: 0.4804
Epoch 91/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2596 - f1_m: 0.4880 - val_loss: 0.1462 - val_f1_m: 0.4766
Epoch 92/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2592 - f1_m: 0.4890 - val_loss: 0.1492 - val_f1_m: 0.4686
Epoch 93/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2595 - f1_m: 0.4904 - val_loss: 0.1491 - val_f1_m: 0.4803
Epoch 94/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2591 - f1_m: 0.4893 - val_loss: 0.1547 - val_f1_m: 0.4734
Epoch 95/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2587 - f1_m: 0.4880 - val_loss: 0.1464 - val_f1_m: 0.4676
Epoch 96/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2597 - f1_m: 0.4870 - val_loss: 0.1432 - val_f1_m: 0.4835
Epoch 97/100
1858261/1858261 [==============================] - 42s 22us/step - loss: 0.2581 - f1_m: 0.4870 - val_loss: 0.1436 - val_f1_m: 0.4831
Epoch 98/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2586 - f1_m: 0.4903 - val_loss: 0.1548 - val_f1_m: 0.4713
Epoch 99/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2575 - f1_m: 0.4901 - val_loss: 0.1526 - val_f1_m: 0.4772
Epoch 100/100
1858261/1858261 [==============================] - 41s 22us/step - loss: 0.2583 - f1_m: 0.4850 - val_loss: 0.1497 - val_f1_m: 0.4774

7. Results

In [102]:
results = model.evaluate(x_test, y_test)
print('loss: ', results[0])
print('F1-score: ', results[1])
229416/229416 [==============================] - 8s 34us/step
loss:  0.15041480135174815
F1-score:  0.2547178268432617

7.1 Confusion matrix

In [103]:
y_pred = model.predict(x_test)
y_pred_class = model.predict_classes(x_test)
cm = confusion_matrix(y_test,y_pred_class)
cr = classification_report(y_test,y_pred_class)
ax= plt.subplot()
sn.heatmap(cm, annot=True, fmt='g', cmap="Blues", ax = ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')  # sklearn's confusion_matrix: rows = true labels, columns = predicted
ax.set_title('Confusion Matrix')
ax.xaxis.set_ticklabels(['0', '1'])
ax.yaxis.set_ticklabels(['0', '1'])
Out[103]:
[Text(0, 0.5, '0'), Text(0, 1.5, '1')]
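
sklearn's confusion_matrix puts the true labels on the rows and the predicted labels on the columns, which is how the heatmap axes above are labelled. A quick way to unpack the four cells of a binary matrix (a sketch using the cm computed above):

# rows = true labels, columns = predicted labels, so ravel() yields TN, FP, FN, TP
tn, fp, fn, tp = cm.ravel()
print('TN:', tn, 'FP:', fp, 'FN:', fn, 'TP:', tp)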

7.2 Other metrics, computed the same way as earlier

In [116]:
print(cr)

# note: with rows = true labels, these formulas treat class 0 as the positive class,
# so 'Precision'/'Sensitivity' below are the class-0 precision and recall from the report,
# and the 'Specificity' line actually prints the precision of class 1
print('Precision:', str(round(cm[0][0] * 100 / (cm[0][0] + cm[1][0]), 2)) + '%')
print('Sensitivity(Recall):', str(round(cm[0][0] * 100 / (cm[0][0] + cm[0][1]), 2)) + '%')
print('Specificity:', str(round(cm[1][1] * 100 / (cm[1][1] + cm[0][1]), 2)) + '%')
              precision    recall  f1-score   support

         0.0       0.98      0.98      0.98    369897
         1.0       0.39      0.47      0.43     12461

    accuracy                           0.96    382358
   macro avg       0.69      0.72      0.70    382358
weighted avg       0.96      0.96      0.96    382358

Precision: 98.21%
Sensitivity(Recall): 97.57%
Specificity: 39.49%
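
For reference, the same per-class numbers can be read off directly with sklearn instead of by hand from the matrix (a sketch; the pos_label argument, not used elsewhere in this notebook, selects the class the metric is computed for):

from sklearn.metrics import precision_score, recall_score
# class-0 precision and recall, matching the 'Precision'/'Sensitivity' lines above
print('Precision (class 0):', precision_score(y_test, y_pred_class, pos_label=0))
print('Recall (class 0):', recall_score(y_test, y_pred_class, pos_label=0))
# class-1 precision, matching the 'Specificity' line above
print('Precision (class 1):', precision_score(y_test, y_pred_class, pos_label=1))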

7.2.1 Balanced Accuracy score

In [105]:
balanced_accuracy_score(y_test, y_pred_class)
Out[105]:
0.6878334107912554
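
Balanced accuracy is the mean of the per-class recalls, which is why it sits far below the raw 96% accuracy: the minority class drags it down. It can be recomputed by hand (a sketch, assuming cm comes from the same predictions as above):

# mean of per-class recall: (recall of class 0 + recall of class 1) / 2
recall_0 = cm[0][0] / (cm[0][0] + cm[0][1])
recall_1 = cm[1][1] / (cm[1][0] + cm[1][1])
print((recall_0 + recall_1) / 2)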

7.2.2 MCC

In [106]:
matthews_corrcoef(y_test, y_pred_class)
Out[106]:
0.4608475698463885
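
MCC combines all four cells of the confusion matrix into a single score in [-1, 1], which makes it more trustworthy than plain accuracy on data this imbalanced:

MCC = (TP * TN - FP * FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))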

8. Plots

8.1 F1 score across epochs

In [107]:
plt.plot(history.history['f1_m'])
plt.plot(history.history['val_f1_m'])
plt.title('model F1 score')
plt.ylabel('F1')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

8.2 Loss across epochs

In [108]:
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()
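
Both plots show the training curve still improving while the validation curve stagnates, so training could have been cut short. A minimal sketch with Keras's EarlyStopping callback (illustrative only; the run above trained for the full 100 epochs without it):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
# passing the callback to fit() stops training once val_loss stops improving
# and rolls the weights back to the best epoch seen
model.fit(x_train, y_train, batch_size=512, epochs=100,
          validation_data=(x_val, y_val), callbacks=[early_stop])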

8.3 ROC plot

In [109]:
lw = 2
fpr, tpr, thresholds = roc_curve(y_test, y_pred)
roc_auc = auc(fpr, tpr)   
plot_roc_curve(fpr, tpr)
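
plot_roc_curve is the helper defined earlier in the notebook; it was roughly of the following shape (a hypothetical reconstruction, not the exact cell — assuming a simple matplotlib helper):

def plot_roc_curve(fpr, tpr):
    # hypothetical sketch: draw the ROC curve, the diagonal chance line,
    # and report the AUC in the legend
    plt.plot(fpr, tpr, lw=2, label='ROC curve (area = %0.2f)' % auc(fpr, tpr))
    plt.plot([0, 1], [0, 1], lw=2, linestyle='--')
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title('Receiver operating characteristic (ROC)')
    plt.legend(loc='lower right')
    plt.show()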

9. Last Neural Network with StratifiedKFold.

In a nutshell, StratifiedKFold splits the data into K folds and uses a different fold as test data each time, while keeping the percentage of genotype = 1 samples the same in every fold, so the classes are distributed equally across folds.

I also made a ROC plot here, because it was an easy way to draw and compare the curves of all folds.
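
A minimal sketch of what "the same percentage in every fold" means in practice (assuming data_y is the 0/1 genotype column used below):

skf_demo = StratifiedKFold(n_splits=6, shuffle=True, random_state=3)
for _, test_index in skf_demo.split(data_x, data_y):
    # every test fold carries (almost exactly) the same share of genotype = 1
    print('share of 1s in test fold:', data_y.iloc[test_index].mean())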

In [110]:
skf = StratifiedKFold(n_splits=6, shuffle=True, random_state=3)
models_results = []
mean_tpr = 0.0
mean_fpr = np.linspace(0, 1, 100)
all_tpr = []
i = 1
for train_index, test_index in skf.split(data_x, data_y):
    x_train, x_test = data_x.iloc[train_index], data_x.iloc[test_index]
    y_train, y_test = data_y.iloc[train_index], data_y.iloc[test_index]
    
    # standardize the continuous columns (helper defined earlier in the notebook)
    standardize('QUAL')
    standardize('DP')
    standardize('CALL')
    standardize('DP2')
    
    x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, stratify=y_train, test_size=0.1, random_state=11)
    
    model = Sequential()
    model.add(Dense(512, input_shape=(134,), activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(256, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='adam', 
                      loss='binary_crossentropy', 
                      metrics=[f1_m])
    weights = {0: 1, 1: 7.5}  # up-weight the rare genotype = 1 class
    history = model.fit(x_train, y_train,
              class_weight=weights,
              batch_size=512, 
              epochs=25,
              validation_data=(x_val, y_val))
    
    loss_and_f1 = model.evaluate(x_test, y_test)
    y_pred = model.predict(x_test)
    y_pred_class = model.predict_classes(x_test)
    cm = confusion_matrix(y_test,y_pred_class)
    cr = classification_report(y_test,y_pred_class)
    results = (loss_and_f1[0], loss_and_f1[1], cm, cr, y_pred_class, y_test)
    models_results.append(results)
    
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    mean_tpr += np.interp(mean_fpr, fpr, tpr)
    mean_tpr[0] = 0.0
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, lw=1, label='ROC fold %d (area = %0.2f)' % (i, roc_auc))
    i+=1
plt.plot([0, 1], [0, 1], '--', color=(0.6, 0.6, 0.6), label='Luck')

mean_tpr /= 6
mean_tpr[-1] = 1.0
mean_auc = auc(mean_fpr, mean_tpr)
plt.plot(mean_fpr, mean_tpr, 'k--',
        label='Mean ROC (area = %0.2f)' % mean_auc, lw=2)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic (ROC) with cross-validation')
plt.legend(loc="lower right")
plt.show()
Train on 1720612 samples, validate on 191180 samples
Epoch 1/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.4268 - f1_m: 0.4179 - val_loss: 0.1879 - val_f1_m: 0.4134
Epoch 2/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.4053 - f1_m: 0.4323 - val_loss: 0.1683 - val_f1_m: 0.4401
Epoch 3/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.4009 - f1_m: 0.4334 - val_loss: 0.1796 - val_f1_m: 0.4423
Epoch 4/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3978 - f1_m: 0.4359 - val_loss: 0.1855 - val_f1_m: 0.4355
Epoch 5/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3950 - f1_m: 0.4349 - val_loss: 0.1659 - val_f1_m: 0.4365
Epoch 6/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3921 - f1_m: 0.4344 - val_loss: 0.1713 - val_f1_m: 0.4334
Epoch 7/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3896 - f1_m: 0.4390 - val_loss: 0.1776 - val_f1_m: 0.4449
Epoch 8/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3868 - f1_m: 0.4385 - val_loss: 0.1745 - val_f1_m: 0.4268
Epoch 9/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3841 - f1_m: 0.4339 - val_loss: 0.1731 - val_f1_m: 0.4422
Epoch 10/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3816 - f1_m: 0.4344 - val_loss: 0.1704 - val_f1_m: 0.4334
Epoch 11/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3789 - f1_m: 0.4356 - val_loss: 0.1745 - val_f1_m: 0.4279
Epoch 12/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3767 - f1_m: 0.4368 - val_loss: 0.1688 - val_f1_m: 0.4391
Epoch 13/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3744 - f1_m: 0.4377 - val_loss: 0.1716 - val_f1_m: 0.4450
Epoch 14/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3722 - f1_m: 0.4392 - val_loss: 0.1665 - val_f1_m: 0.4310
Epoch 15/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3701 - f1_m: 0.4402 - val_loss: 0.1782 - val_f1_m: 0.4373
Epoch 16/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3678 - f1_m: 0.4408 - val_loss: 0.1862 - val_f1_m: 0.4407
Epoch 17/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3664 - f1_m: 0.4418 - val_loss: 0.1829 - val_f1_m: 0.4288
Epoch 18/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3643 - f1_m: 0.4424 - val_loss: 0.1821 - val_f1_m: 0.4284
Epoch 19/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3626 - f1_m: 0.4423 - val_loss: 0.1795 - val_f1_m: 0.4248
Epoch 20/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3607 - f1_m: 0.4419 - val_loss: 0.1611 - val_f1_m: 0.4424
Epoch 21/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3600 - f1_m: 0.4437 - val_loss: 0.1848 - val_f1_m: 0.4337
Epoch 22/25
1720612/1720612 [==============================] - 37s 22us/step - loss: 0.3585 - f1_m: 0.4429 - val_loss: 0.1650 - val_f1_m: 0.4372
Epoch 23/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3574 - f1_m: 0.4431 - val_loss: 0.1860 - val_f1_m: 0.4104
Epoch 24/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3560 - f1_m: 0.4462 - val_loss: 0.1819 - val_f1_m: 0.4143
Epoch 25/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3547 - f1_m: 0.4460 - val_loss: 0.1788 - val_f1_m: 0.4261
382359/382359 [==============================] - 13s 35us/step
Train on 1720612 samples, validate on 191180 samples
Epoch 1/25
1720612/1720612 [==============================] - 39s 22us/step - loss: 0.4273 - f1_m: 0.4074 - val_loss: 0.2038 - val_f1_m: 0.4372
Epoch 2/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.4056 - f1_m: 0.4267 - val_loss: 0.1858 - val_f1_m: 0.4301
Epoch 3/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.4019 - f1_m: 0.4290 - val_loss: 0.1902 - val_f1_m: 0.4197
Epoch 4/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3986 - f1_m: 0.4297 - val_loss: 0.1804 - val_f1_m: 0.4478
Epoch 5/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3960 - f1_m: 0.4330 - val_loss: 0.1765 - val_f1_m: 0.4290
Epoch 6/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3931 - f1_m: 0.4350 - val_loss: 0.1625 - val_f1_m: 0.4521
Epoch 7/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3907 - f1_m: 0.4329 - val_loss: 0.1677 - val_f1_m: 0.4492
Epoch 8/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3882 - f1_m: 0.4340 - val_loss: 0.1621 - val_f1_m: 0.4496
Epoch 9/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3852 - f1_m: 0.4331 - val_loss: 0.1641 - val_f1_m: 0.4324
Epoch 10/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3829 - f1_m: 0.4346 - val_loss: 0.1741 - val_f1_m: 0.4185
Epoch 11/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3807 - f1_m: 0.4344 - val_loss: 0.1778 - val_f1_m: 0.4297
Epoch 12/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3784 - f1_m: 0.4372 - val_loss: 0.1795 - val_f1_m: 0.4097
Epoch 13/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3756 - f1_m: 0.4381 - val_loss: 0.1766 - val_f1_m: 0.4249
Epoch 14/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3735 - f1_m: 0.4393 - val_loss: 0.1848 - val_f1_m: 0.4250
Epoch 15/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3719 - f1_m: 0.4420 - val_loss: 0.1641 - val_f1_m: 0.4334
Epoch 16/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3697 - f1_m: 0.4397 - val_loss: 0.1637 - val_f1_m: 0.4347
Epoch 17/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3673 - f1_m: 0.4407 - val_loss: 0.1739 - val_f1_m: 0.4196
Epoch 18/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3661 - f1_m: 0.4389 - val_loss: 0.1712 - val_f1_m: 0.4235
Epoch 19/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3645 - f1_m: 0.4404 - val_loss: 0.1653 - val_f1_m: 0.4205
Epoch 20/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3636 - f1_m: 0.4404 - val_loss: 0.1683 - val_f1_m: 0.4325
Epoch 21/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3625 - f1_m: 0.4443 - val_loss: 0.1752 - val_f1_m: 0.4265
Epoch 22/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3612 - f1_m: 0.4434 - val_loss: 0.1784 - val_f1_m: 0.4179
Epoch 23/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3593 - f1_m: 0.4451 - val_loss: 0.1710 - val_f1_m: 0.4344
Epoch 24/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3582 - f1_m: 0.4482 - val_loss: 0.1742 - val_f1_m: 0.4209
Epoch 25/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3567 - f1_m: 0.4468 - val_loss: 0.1750 - val_f1_m: 0.4273
382359/382359 [==============================] - 14s 37us/step
Train on 1720612 samples, validate on 191180 samples
Epoch 1/25
1720612/1720612 [==============================] - 39s 23us/step - loss: 0.4249 - f1_m: 0.4172 - val_loss: 0.1870 - val_f1_m: 0.4456
Epoch 2/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.4057 - f1_m: 0.4294 - val_loss: 0.1794 - val_f1_m: 0.4179
Epoch 3/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.4017 - f1_m: 0.4304 - val_loss: 0.1644 - val_f1_m: 0.4558
Epoch 4/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3991 - f1_m: 0.4305 - val_loss: 0.1786 - val_f1_m: 0.4342
Epoch 5/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3964 - f1_m: 0.4313 - val_loss: 0.1823 - val_f1_m: 0.4392
Epoch 6/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3938 - f1_m: 0.4357 - val_loss: 0.1691 - val_f1_m: 0.4241
Epoch 7/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3916 - f1_m: 0.4350 - val_loss: 0.1662 - val_f1_m: 0.4440
Epoch 8/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3895 - f1_m: 0.4392 - val_loss: 0.1781 - val_f1_m: 0.4292
Epoch 9/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3863 - f1_m: 0.4379 - val_loss: 0.1821 - val_f1_m: 0.4144
Epoch 10/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3842 - f1_m: 0.4387 - val_loss: 0.1676 - val_f1_m: 0.4312
Epoch 11/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3814 - f1_m: 0.4372 - val_loss: 0.1714 - val_f1_m: 0.4289
Epoch 12/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3796 - f1_m: 0.4377 - val_loss: 0.1691 - val_f1_m: 0.4247
Epoch 13/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3776 - f1_m: 0.4421 - val_loss: 0.1732 - val_f1_m: 0.4240
Epoch 14/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3751 - f1_m: 0.4422 - val_loss: 0.1677 - val_f1_m: 0.4348
Epoch 15/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3734 - f1_m: 0.4433 - val_loss: 0.1708 - val_f1_m: 0.4251
Epoch 16/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3717 - f1_m: 0.4412 - val_loss: 0.1598 - val_f1_m: 0.4415
Epoch 17/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3698 - f1_m: 0.4447 - val_loss: 0.1570 - val_f1_m: 0.4265
Epoch 18/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3681 - f1_m: 0.4454 - val_loss: 0.1773 - val_f1_m: 0.4093
Epoch 19/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3665 - f1_m: 0.4442 - val_loss: 0.1711 - val_f1_m: 0.4346
Epoch 20/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3654 - f1_m: 0.4450 - val_loss: 0.1811 - val_f1_m: 0.4195
Epoch 21/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3635 - f1_m: 0.4454 - val_loss: 0.1689 - val_f1_m: 0.4223
Epoch 22/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3626 - f1_m: 0.4442 - val_loss: 0.1710 - val_f1_m: 0.4216
Epoch 23/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3607 - f1_m: 0.4452 - val_loss: 0.1659 - val_f1_m: 0.4336
Epoch 24/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3590 - f1_m: 0.4467 - val_loss: 0.1723 - val_f1_m: 0.4224
Epoch 25/25
1720612/1720612 [==============================] - 38s 22us/step - loss: 0.3585 - f1_m: 0.4459 - val_loss: 0.1750 - val_f1_m: 0.4117
382359/382359 [==============================] - 15s 38us/step
Train on 1720613 samples, validate on 191180 samples
Epoch 1/25
1720613/1720613 [==============================] - 44s 25us/step - loss: 0.4286 - f1_m: 0.4272 - val_loss: 0.1840 - val_f1_m: 0.4539
Epoch 2/25
1720613/1720613 [==============================] - 43s 25us/step - loss: 0.4057 - f1_m: 0.4317 - val_loss: 0.1588 - val_f1_m: 0.4277
Epoch 3/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.4008 - f1_m: 0.4391 - val_loss: 0.1727 - val_f1_m: 0.4280
Epoch 4/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3979 - f1_m: 0.4347 - val_loss: 0.1704 - val_f1_m: 0.4385
Epoch 5/25
1720613/1720613 [==============================] - 45s 26us/step - loss: 0.3948 - f1_m: 0.4369 - val_loss: 0.1723 - val_f1_m: 0.4567
Epoch 6/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3917 - f1_m: 0.4358 - val_loss: 0.1745 - val_f1_m: 0.4534
Epoch 7/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3893 - f1_m: 0.4399 - val_loss: 0.1676 - val_f1_m: 0.4330
Epoch 8/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3858 - f1_m: 0.4419 - val_loss: 0.1815 - val_f1_m: 0.4415
Epoch 9/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3832 - f1_m: 0.4394 - val_loss: 0.1722 - val_f1_m: 0.4233
Epoch 10/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3807 - f1_m: 0.4401 - val_loss: 0.1738 - val_f1_m: 0.4368
Epoch 11/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3783 - f1_m: 0.4420 - val_loss: 0.1733 - val_f1_m: 0.4362
Epoch 12/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3761 - f1_m: 0.4440 - val_loss: 0.1867 - val_f1_m: 0.4264
Epoch 13/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3738 - f1_m: 0.4429 - val_loss: 0.1790 - val_f1_m: 0.4207
Epoch 14/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3711 - f1_m: 0.4422 - val_loss: 0.1741 - val_f1_m: 0.4329
Epoch 15/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3693 - f1_m: 0.4436 - val_loss: 0.1672 - val_f1_m: 0.4466
Epoch 16/25
1720613/1720613 [==============================] - 40s 24us/step - loss: 0.3677 - f1_m: 0.4448 - val_loss: 0.1797 - val_f1_m: 0.4214
Epoch 17/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3662 - f1_m: 0.4416 - val_loss: 0.1813 - val_f1_m: 0.4283
Epoch 18/25
1720613/1720613 [==============================] - 45s 26us/step - loss: 0.3640 - f1_m: 0.4432 - val_loss: 0.1670 - val_f1_m: 0.4293
Epoch 19/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3620 - f1_m: 0.4405 - val_loss: 0.1693 - val_f1_m: 0.4272
Epoch 20/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3612 - f1_m: 0.4453 - val_loss: 0.1925 - val_f1_m: 0.4180
Epoch 21/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3588 - f1_m: 0.4471 - val_loss: 0.1744 - val_f1_m: 0.4210
Epoch 22/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3578 - f1_m: 0.4434 - val_loss: 0.1849 - val_f1_m: 0.4201
Epoch 23/25
1720613/1720613 [==============================] - 44s 25us/step - loss: 0.3562 - f1_m: 0.4449 - val_loss: 0.1767 - val_f1_m: 0.4328
Epoch 24/25
1720613/1720613 [==============================] - 43s 25us/step - loss: 0.3545 - f1_m: 0.4431 - val_loss: 0.1891 - val_f1_m: 0.4119
Epoch 25/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3538 - f1_m: 0.4428 - val_loss: 0.1858 - val_f1_m: 0.4069
382358/382358 [==============================] - 15s 40us/step
Train on 1720613 samples, validate on 191180 samples
Epoch 1/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.4249 - f1_m: 0.4206 - val_loss: 0.1941 - val_f1_m: 0.4510
Epoch 2/25
1720613/1720613 [==============================] - 39s 23us/step - loss: 0.4061 - f1_m: 0.4308 - val_loss: 0.1929 - val_f1_m: 0.4159
Epoch 3/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.4015 - f1_m: 0.4316 - val_loss: 0.1866 - val_f1_m: 0.4391
Epoch 4/25
1720613/1720613 [==============================] - 39s 23us/step - loss: 0.3990 - f1_m: 0.4355 - val_loss: 0.1780 - val_f1_m: 0.4355
Epoch 5/25
1720613/1720613 [==============================] - 39s 23us/step - loss: 0.3962 - f1_m: 0.4347 - val_loss: 0.1700 - val_f1_m: 0.4660
Epoch 6/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3936 - f1_m: 0.4363 - val_loss: 0.1729 - val_f1_m: 0.4355
Epoch 7/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3907 - f1_m: 0.4346 - val_loss: 0.1727 - val_f1_m: 0.4533
Epoch 8/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.3887 - f1_m: 0.4360 - val_loss: 0.1688 - val_f1_m: 0.4492
Epoch 9/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.3855 - f1_m: 0.4378 - val_loss: 0.1759 - val_f1_m: 0.4431
Epoch 10/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.3830 - f1_m: 0.4372 - val_loss: 0.1669 - val_f1_m: 0.4474
Epoch 11/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.3809 - f1_m: 0.4374 - val_loss: 0.1726 - val_f1_m: 0.4299
Epoch 12/25
1720613/1720613 [==============================] - 39s 22us/step - loss: 0.3785 - f1_m: 0.4413 - val_loss: 0.1716 - val_f1_m: 0.4421
Epoch 13/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.3760 - f1_m: 0.4384 - val_loss: 0.1780 - val_f1_m: 0.4338
Epoch 14/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.3735 - f1_m: 0.4370 - val_loss: 0.1786 - val_f1_m: 0.4158
Epoch 15/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3723 - f1_m: 0.4395 - val_loss: 0.1829 - val_f1_m: 0.4286
Epoch 16/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3696 - f1_m: 0.4397 - val_loss: 0.1794 - val_f1_m: 0.4441
Epoch 17/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3685 - f1_m: 0.4422 - val_loss: 0.1886 - val_f1_m: 0.4197
Epoch 18/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3671 - f1_m: 0.4422 - val_loss: 0.1682 - val_f1_m: 0.4295
Epoch 19/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3645 - f1_m: 0.4400 - val_loss: 0.1917 - val_f1_m: 0.4261
Epoch 20/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3637 - f1_m: 0.4398 - val_loss: 0.1812 - val_f1_m: 0.4255
Epoch 21/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3616 - f1_m: 0.4424 - val_loss: 0.1745 - val_f1_m: 0.4410
Epoch 22/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3606 - f1_m: 0.4415 - val_loss: 0.1717 - val_f1_m: 0.4394
Epoch 23/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3589 - f1_m: 0.4398 - val_loss: 0.1840 - val_f1_m: 0.4057
Epoch 24/25
1720613/1720613 [==============================] - 42s 24us/step - loss: 0.3578 - f1_m: 0.4418 - val_loss: 0.1823 - val_f1_m: 0.4338
Epoch 25/25
1720613/1720613 [==============================] - 43s 25us/step - loss: 0.3571 - f1_m: 0.4398 - val_loss: 0.1671 - val_f1_m: 0.4357
382358/382358 [==============================] - 19s 50us/step
Train on 1720613 samples, validate on 191180 samples
Epoch 1/25
1720613/1720613 [==============================] - 43s 25us/step - loss: 0.4257 - f1_m: 0.4166 - val_loss: 0.1763 - val_f1_m: 0.4408
Epoch 2/25
1720613/1720613 [==============================] - 38s 22us/step - loss: 0.4065 - f1_m: 0.4295 - val_loss: 0.2048 - val_f1_m: 0.4348
Epoch 3/25
1720613/1720613 [==============================] - 39s 23us/step - loss: 0.4019 - f1_m: 0.4310 - val_loss: 0.1736 - val_f1_m: 0.4390
Epoch 4/25
1720613/1720613 [==============================] - 46s 27us/step - loss: 0.3986 - f1_m: 0.4323 - val_loss: 0.1642 - val_f1_m: 0.4460
Epoch 5/25
1720613/1720613 [==============================] - 44s 26us/step - loss: 0.3967 - f1_m: 0.4357 - val_loss: 0.1705 - val_f1_m: 0.4130
Epoch 6/25
1720613/1720613 [==============================] - 45s 26us/step - loss: 0.3940 - f1_m: 0.4366 - val_loss: 0.1769 - val_f1_m: 0.4288
Epoch 7/25
1720613/1720613 [==============================] - 46s 27us/step - loss: 0.3913 - f1_m: 0.4387 - val_loss: 0.1633 - val_f1_m: 0.4447
Epoch 8/25
1720613/1720613 [==============================] - 47s 27us/step - loss: 0.3887 - f1_m: 0.4378 - val_loss: 0.1808 - val_f1_m: 0.4265
Epoch 9/25
1720613/1720613 [==============================] - 48s 28us/step - loss: 0.3860 - f1_m: 0.4403 - val_loss: 0.1669 - val_f1_m: 0.4344
Epoch 10/25
1720613/1720613 [==============================] - 47s 27us/step - loss: 0.3831 - f1_m: 0.4376 - val_loss: 0.1671 - val_f1_m: 0.4508
Epoch 11/25
1720613/1720613 [==============================] - 42s 25us/step - loss: 0.3804 - f1_m: 0.4356 - val_loss: 0.1699 - val_f1_m: 0.4216
Epoch 12/25
1720613/1720613 [==============================] - 44s 26us/step - loss: 0.3787 - f1_m: 0.4375 - val_loss: 0.1764 - val_f1_m: 0.4286
Epoch 13/25
1720613/1720613 [==============================] - 43s 25us/step - loss: 0.3769 - f1_m: 0.4415 - val_loss: 0.1781 - val_f1_m: 0.4421
Epoch 14/25
1720613/1720613 [==============================] - 44s 26us/step - loss: 0.3744 - f1_m: 0.4376 - val_loss: 0.1625 - val_f1_m: 0.4385
Epoch 15/25
1720613/1720613 [==============================] - 44s 25us/step - loss: 0.3728 - f1_m: 0.4397 - val_loss: 0.1617 - val_f1_m: 0.4383
Epoch 16/25
1720613/1720613 [==============================] - 42s 25us/step - loss: 0.3706 - f1_m: 0.4430 - val_loss: 0.1611 - val_f1_m: 0.4382
Epoch 17/25
1720613/1720613 [==============================] - 43s 25us/step - loss: 0.3689 - f1_m: 0.4367 - val_loss: 0.1699 - val_f1_m: 0.4208
Epoch 18/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3670 - f1_m: 0.4390 - val_loss: 0.1682 - val_f1_m: 0.4272
Epoch 19/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3659 - f1_m: 0.4403 - val_loss: 0.1683 - val_f1_m: 0.4117
Epoch 20/25
1720613/1720613 [==============================] - 44s 26us/step - loss: 0.3636 - f1_m: 0.4431 - val_loss: 0.1631 - val_f1_m: 0.4327
Epoch 21/25
1720613/1720613 [==============================] - 47s 28us/step - loss: 0.3619 - f1_m: 0.4449 - val_loss: 0.1734 - val_f1_m: 0.4124
Epoch 22/25
1720613/1720613 [==============================] - 41s 24us/step - loss: 0.3612 - f1_m: 0.4437 - val_loss: 0.1703 - val_f1_m: 0.4336
Epoch 23/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3598 - f1_m: 0.4485 - val_loss: 0.1654 - val_f1_m: 0.4383
Epoch 24/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3594 - f1_m: 0.4468 - val_loss: 0.1760 - val_f1_m: 0.4207
Epoch 25/25
1720613/1720613 [==============================] - 40s 23us/step - loss: 0.3578 - f1_m: 0.4460 - val_loss: 0.1694 - val_f1_m: 0.4155
382358/382358 [==============================] - 17s 45us/step

10. Results

10.1 Results for every fold.

In [112]:
sr_acc = 0
sr_mcc = 0
print('Results for every fold: ' + '\n\n')
for i, j in enumerate(models_results):
    print('-- Fold',i + 1,'--\n')
    print('F1 score:', str(round(j[1] * 100, 2)) + '%')
    print('Loss:',round(j[0], 4))
    print('\n' + 'Confusion matrix:' + '\n',j[2])
    # as in section 7.2, these are the class-0 precision and recall
    print('\n' + 'Precision:', str(round(j[2][0][0] * 100 / (j[2][0][0] + j[2][1][0]), 2)) + '%')
    print('Sensitivity(Recall):', str(round(j[2][0][0] * 100 / (j[2][0][0] + j[2][0][1]), 2)) + '%')
    ba_acc = balanced_accuracy_score(j[5], j[4])
    # for hard 0/1 predictions, balanced accuracy equals the AUC, hence '(=AUC)'
    print('\n' + 'Balanced Accuracy Score:(=AUC)', ba_acc)
    sr_acc += ba_acc
    mcc = matthews_corrcoef(j[5], j[4])
    print('\n' + 'MCC:', mcc)
    sr_mcc += mcc
    print('\n',j[3])
    print('\n\n')
Results for every fold: 


-- Fold 1 --

F1 score: 14.19%
Loss: 0.1807

Confusion matrix:
 [[360415   9483]
 [  6618   5843]]

Precision: 98.2%
Sensitivity(Recall): 97.44%

Balanced Accuracy Score:(=AUC) 0.7216330900590151

MCC: 0.40125002217422956

               precision    recall  f1-score   support

         0.0       0.98      0.97      0.98    369898
         1.0       0.38      0.47      0.42     12461

    accuracy                           0.96    382359
   macro avg       0.68      0.72      0.70    382359
weighted avg       0.96      0.96      0.96    382359




-- Fold 2 --

F1 score: 13.85%
Loss: 0.1742

Confusion matrix:
 [[361789   8109]
 [  6652   5809]]

Precision: 98.19%
Sensitivity(Recall): 97.81%

Balanced Accuracy Score:(=AUC) 0.722126102339374

MCC: 0.4211866905334542

               precision    recall  f1-score   support

         0.0       0.98      0.98      0.98    369898
         1.0       0.42      0.47      0.44     12461

    accuracy                           0.96    382359
   macro avg       0.70      0.72      0.71    382359
weighted avg       0.96      0.96      0.96    382359




-- Fold 3 --

F1 score: 13.86%
Loss: 0.175

Confusion matrix:
 [[360207   9691]
 [  6475   5986]]

Precision: 98.23%
Sensitivity(Recall): 97.38%

Balanced Accuracy Score:(=AUC) 0.7270898337243855

MCC: 0.4066950182908264

               precision    recall  f1-score   support

         0.0       0.98      0.97      0.98    369898
         1.0       0.38      0.48      0.43     12461

    accuracy                           0.96    382359
   macro avg       0.68      0.73      0.70    382359
weighted avg       0.96      0.96      0.96    382359




-- Fold 4 --

F1 score: 14.43%
Loss: 0.1863

Confusion matrix:
 [[359035  10863]
 [  6436   6024]]

Precision: 98.24%
Sensitivity(Recall): 97.06%

Balanced Accuracy Score:(=AUC) 0.727049769661459

MCC: 0.3924174954157874

               precision    recall  f1-score   support

         0.0       0.98      0.97      0.98    369898
         1.0       0.36      0.48      0.41     12460

    accuracy                           0.95    382358
   macro avg       0.67      0.73      0.69    382358
weighted avg       0.96      0.95      0.96    382358




-- Fold 5 --

F1 score: 13.48%
Loss: 0.1678

Confusion matrix:
 [[361976   7922]
 [  6715   5745]]

Precision: 98.18%
Sensitivity(Recall): 97.86%

Balanced Accuracy Score:(=AUC) 0.7198293632671823

MCC: 0.4204824219019086

               precision    recall  f1-score   support

         0.0       0.98      0.98      0.98    369898
         1.0       0.42      0.46      0.44     12460

    accuracy                           0.96    382358
   macro avg       0.70      0.72      0.71    382358
weighted avg       0.96      0.96      0.96    382358




-- Fold 6 --

F1 score: 13.74%
Loss: 0.1691

Confusion matrix:
 [[360900   8997]
 [  6589   5872]]

Precision: 98.21%
Sensitivity(Recall): 97.57%

Balanced Accuracy Score:(=AUC) 0.7234536255668396

MCC: 0.41046034046509616

               precision    recall  f1-score   support

         0.0       0.98      0.98      0.98    369897
         1.0       0.39      0.47      0.43     12461

    accuracy                           0.96    382358
   macro avg       0.69      0.72      0.70    382358
weighted avg       0.96      0.96      0.96    382358





10.2 Average results

10.2.1 Average confusion matrix.

In [113]:
sm = sum(r[2] for r in models_results)
av_cm = sm // 6  # element-wise integer average of the six fold matrices
In [114]:
ax= plt.subplot()
sn.heatmap(av_cm, annot=True, fmt='g', cmap="Blues", ax = ax)
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')  # rows = true labels, columns = predicted labels
ax.set_title('Confusion Matrix')
ax.xaxis.set_ticklabels(['0', '1'])
ax.yaxis.set_ticklabels(['0', '1'])
Out[114]:
[Text(0, 0.5, '0'), Text(0, 1.5, '1')]

10.2.2 Average results for metrics.

In [121]:
print('Average results:')
mean_f1 = (models_results[0][1] + models_results[1][1] + models_results[2][1] + models_results[3][1] +
           models_results[4][1] + models_results[5][1]) / 6
print('\n' + 'F1 score:', str(round(mean_f1 * 100, 2)) + '%')
print('Precision:', str(round(av_cm[0][0] * 100 / (av_cm[0][0] + av_cm[1][0]), 2)) + '%')
print('Sensitivity(Recall):', str(round(av_cm[0][0] * 100 / (av_cm[0][0] + av_cm[0][1]),2)) + '%')
print('Mean balanced accuracy score(=AUC):', str(round(sr_acc / 6, 4)))
print('Mean MCC:', str(round(sr_mcc / 6, 4)))
Average results:

F1 score: 13.93%
Precision: 98.21%
Sensitivity(Recall): 97.52%
Mean balanced accuracy score(=AUC): 0.7235
Mean MCC: 0.4087
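
The same averages can also be computed more compactly from models_results, instead of indexing each fold by hand (a sketch; the tuple layout is the one used above: [0] = loss, [1] = F1):

mean_f1 = np.mean([r[1] for r in models_results])
mean_loss = np.mean([r[0] for r in models_results])
print('F1 score: %.2f%%' % (mean_f1 * 100))
print('Loss: %.4f' % mean_loss)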

11. Summary

                           Recall    Precision   Balanced accuracy   MCC    F1 score
First NN                   98.66%    97.89%      0.68                0.40   23%
Second NN                  99.08%    97.95%      0.69                0.46   25%
Last NN (average results)  97.52%    98.21%      0.7235              0.41   13.93%

12. Conclusions

  • The data I was working on is not easy to deal with; it needs a lot more work, and the variables do not separate good and bad genotypes well.
  • The best result came from the second neural network, which combined class weights with dropout.
  • The first model overfitted, so its results were not very good.
  • The low F1 score of the third NN is due to the high class weights I used.
  • StratifiedKFold could be a good option, but it takes a lot of time to compute.
